Mining the structural genomics pipeline: identification of protein properties that affect high-throughput experimental analysis.

Authors

  • Chern-Sing Goh
  • Ning Lan
  • Shawn M Douglas
  • Baolin Wu
  • Nathaniel Echols
  • Andrew Smith
  • Duncan Milburn
  • Gaetano T Montelione
  • Hongyu Zhao
  • Mark Gerstein
Abstract

Structural genomics projects represent major undertakings that will change our understanding of proteins. They generate unique datasets that, for the first time, present a standardized view of proteins in terms of their physical and chemical properties. By analyzing these datasets, we discover correlations between a protein's characteristics and its progress through each stage of the structural genomics pipeline, from cloning, expression, and purification to, ultimately, structural determination. First, we use tree-based analyses (decision trees and random forest algorithms) to discover the most significant protein features that influence a protein's amenability to high-throughput experimentation. Based on this, we identify potential bottlenecks in various stages of the structural genomics process through specialized "pipeline schematics". We find that the most significant properties of a protein are: (i) whether it is conserved across many organisms; (ii) the percentage composition of charged residues; (iii) the occurrence of hydrophobic patches; (iv) the number of binding partners it has; and (v) its length. Conversely, a number of other properties that might have been thought to be important, such as nuclear localization signals, are not significant. Thus, using our tree-based analyses, we are able to identify combinations of features that best differentiate the small group of proteins for which a structure has been determined from all the currently selected targets. This information may prove useful in optimizing high-throughput experimentation. Further information is available from http://mining.nesg.org/.
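
As a rough illustration of the idea behind the tree-based analyses mentioned in the abstract, the sketch below scores each feature by how much its best single threshold split reduces Gini impurity, which is the kind of criterion a decision tree can use to choose a split. It is not the authors' implementation: the function names, the feature names (`length`, `pct_charged`, `n_partners`), and the toy data are all hypothetical, and a random forest would aggregate such scores over many randomized trees rather than look at one split per feature.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, impurity_decrease) for the best split x <= threshold."""
    parent = gini(labels)
    n = len(labels)
    best = (None, 0.0)
    # Candidate thresholds: every distinct value except the largest,
    # so that both sides of the split are non-empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - child
        if gain > best[1]:
            best = (threshold, gain)
    return best


def rank_features(rows, labels):
    """Rank features by the impurity decrease of their best single split."""
    scores = {
        name: best_split([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return sorted(scores.items(), key=lambda item: item[1][1], reverse=True)


# Hypothetical toy targets, invented for illustration (not data from the paper).
TOY_ROWS = [
    {"length": 120, "pct_charged": 30, "n_partners": 1},
    {"length": 450, "pct_charged": 28, "n_partners": 2},
    {"length": 150, "pct_charged": 31, "n_partners": 5},
    {"length": 600, "pct_charged": 18, "n_partners": 1},
    {"length": 200, "pct_charged": 17, "n_partners": 4},
    {"length": 520, "pct_charged": 15, "n_partners": 3},
]
# 1 = structure determined, 0 = not determined (again, invented labels).
TOY_SOLVED = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    for name, (threshold, gain) in rank_features(TOY_ROWS, TOY_SOLVED):
        print(f"{name}: split at <= {threshold}, impurity decrease {gain:.3f}")
```

On this invented data the charged-residue percentage separates the two groups perfectly, so it ranks first; with real pipeline data the ranking would depend on the targets and on how each feature is computed.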

Similar resources

Prediction of active sites for protein structures from computed chemical properties

MOTIVATION Identification of functional information for a protein from its three-dimensional (3D) structure is a major challenge in genomics. The power of theoretical microscopic titration curves (THEMATICS), when coupled with a statistical analysis, provides a method for high-throughput screening for identification of catalytic sites and binding sites with high accuracy and precision. The meth...


Protein therapeutics: promises and challenges for the 21st century.

Recent advances in massively parallel experimental and computational technologies are leading to radically new approaches to the early phases of the drug production pipeline. The revolution in DNA microarray technologies and the imminent emergence of its analogue for proteins, along with machine learning algorithms, promise rapid acceleration in the identification of potential drug targets, and...


Higher-throughput approaches to crystallization and crystal structure determination.

In recent times, there has been a large increase in the number of protein structures deposited in the Protein Data Bank. Structural genomics initiatives have contributed to this expansion through their focus on high-throughput structural determination. This has fuelled advances in many of the techniques in the pipeline from gene to protein to crystal to structure. These include ligation-indepen...


The JCSG high-throughput structural biology pipeline

The Joint Center for Structural Genomics high-throughput structural biology pipeline has delivered more than 1000 structures to the community over the past ten years. The JCSG has made a significant contribution to the overall goal of the NIH Protein Structure Initiative (PSI) of expanding structural coverage of the protein universe, as well as making substantial inroads into structural coverag...


Structural genomics of the Thermotoga maritima proteome implemented in a high-throughput structure determination pipeline.

Structural genomics is emerging as a principal approach to define protein structure-function relationships. To apply this approach on a genomic scale, novel methods and technologies must be developed to determine large numbers of structures. We describe the design and implementation of a high-throughput structural genomics pipeline and its application to the proteome of the thermophilic bacteri...


Journal:
  • Journal of molecular biology

Volume 336, Issue 1

Pages: -

Publication date: 2004